perm filename SLR1.JP[UP,DOC] blob
sn#512363 filedate 1980-05-29 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00007 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 How to use the SLR1 parser generator
C00005 00003 A sample input grammar
C00009 00004 SLR1's output
C00013 00005 Semantics routines
C00016 00006 SEM : The semantic routines
C00022 00007 The default scanner
C00036 ENDMK
C⊗;
How to use the SLR1 parser generator
The following is a sample run of SLR1. Notice that there are facilities
for checking the grammar before generating the parser. If you enter E on
the grammar there is no way as of yet to return from E directly to the
program. The input grammar appears in the following page.
The source SLR1 parser generator is in SLR1.SAI[CSP,SYS], and the runtime
support for the generated parser is in SLR1A.SAI[CSP,SYS].
.r slr1
Production file? gram
Productions to be Listed? --Terminate With <crlf>: <assignment>
The Productions are:
<ASSIGNMENT>--><LEFTSIDE> := <AE>
Productions to be Listed? --Terminate With <crlf>: <leftside>
The Productions are:
<LEFTSIDE>-->ID
Productions to be Listed? --Terminate With <crlf>:
Do you want to see the symbol table? y
ID = 1
NUM = 2
REAL = 3
STRING = 4
:= = 5
+ = 6
- = 7
* = 8
/ = 9
↑ = 10
( = 11
) = 12
Want to edit grammar?
Start Symbol? <assignment>
Generated 29 production states.
Single reduction eliminated: 3
Unoptimized number of states: 30
optimized number of states: 27
Do you want a dump of the relational graph? n
File Name for Generated Parser? arith.sai
Generating parsing actions... done!
End of SAIL execution
↑C
.
Notice that negative answers to questions can be given by typing CR-LF.
The relational graph of SLR1 items is always output to file DUMP.PDL in
the alias directory. All other files are defined by the luser. If lookup
for a file fails the loser will be reprompted. SLR1 accepts full file
names.
A sample input grammar
The following is an SLR1 grammar for arithmetic expressions (for further
info on SLR1 grammars see Aho & Ullman Vol II), in the input format required
by the parser generator.
ε ID NUM REAL STRING $
6 $<ASSIGNMENT>--> <LEFTSIDE> := <AE> $
<AE>--><SAE> $
<SAE>--><TERM> $
0 $<SAE>--><SAE> + <TERM> $
1 $<SAE>--><SAE> - <TERM> $
<TERM>--> <FACTOR> $
<TERM>--> <FACTOR1> $
3 $<TERM>--><TERM> * <FACTOR> $
4 $<TERM>--><TERM> / <FACTOR> $
<FACTOR>--> <PRIMARY> $
<FACTOR1>--> <PRIMARY> $
2 $<FACTOR>--><FACTOR> ↑ <FACTOR1> $
5 $<PRIMARY>-->ID $
5 $<LEFTSIDE>--> ID $
8 $<PRIMARY>-->NUM $
7 $<PRIMARY>-->REAL $
9 $<PRIMARY>-->STRING $
$<PRIMARY>-->( <AE> )$
The format for the input grammar should be apparent from the above.
Non-terminal ≡ <..garbagechars..> (exception: <> is a terminal!)
Also notice that chars between brackets cannot be
brackets themselves! BEWARE!!
Terminal ≡ ..any sequence of chars which is not a nonterminal..
Production ≡ { integer $ } non-terminal --> ..rhs.. $
Notice that the production prefix is optional. If it appears, it
associates the production with a semantic routine to be applied when
production succeeds (see section on semantics). Notice that the ending $
is necessary to signal the end of a production (yes, you do have to type
--> !!!). The first line of a grammar SHOULD always appear. It binds
distinguished terminals in the grammar with the default scanner syntax
classes. The binding is positional with the first terminal appearing being
bound to syntactic class 0. Notice that this only applies to the default
scanner. For the daring few who provide their own scanner a different
syntax class convention may apply. Nevertheless SLR1 is equipped with this
binding feature and it must be used in any luser provided scanner.
The current syntax class assignment (and the associated token syntax) is
0 → epsilon (null terminal)
1 → identifier (alphanumeric)
2 → integer
3 → real (nnnn.mmmmmEsll where s = + or -
nnnn cannot be null
ll is the exponent)
4 → string ('...any chars...')
to include a ' in a string use ''.
Comments can be typed in as "comment-text" and interspersed freely in the source
between syntactic elements (not within!).
SLR1's output
The output of SLR1 comes in two flavors. One is the parser, relatively
efficient although not optimized (an optimization effort is done to remove
single productions to cut down on the number of states). The other is the
optional dump of the relational graph and LR items. This is very useful
specially for checking for SLR1 bugs, and that the input grammar satisfy
the SLR1 test and not be ambiguous. The one YOU are really interested in
is the parser. The parser uses three load modules. Two of them have to be
provided always by the user, and the third is provided by the system. This
one is the scanner. Notice that you can also provide the scanner by
suitable modification of the "require" statement in the generated parser.
The scanner is slr1a.rel[1,3], and the semantic processors (ie the other 2
files) are proces and sem in the luser's alias directory. THEY MUST ALWAYS
BE PROVIDED (...unless you hack your generated parser not to need them).
Using your new parser
Compile and load it. When started it will prompt you with a *. You have
two options here. Either type in a file name, and parser will read the
program to be parsed from the corresponding file or type a CR-LF and the
parser will accept input from the tty. EOF is signalled by either ↑Z in
ttys, ⊗lf in DDs or edit-null-edit-lf in Datamedias.
Who to complain to
If you have any suggestions, comments, bugs or want some more info see
either JP or JLS, if we are still around.
Error detection and recovery
No attempt to deal with automatic generation of error detection and
recovery routines has been made. Nevertheless, it should be rather
straight-forward to add these into the generated parser. Some ideas can be
found in Aho and Ullman and in the proceedings of the last ACM conference
(1976) at Houston. At present the only facility provided is the flagging
of errors and the message SYNTAX ERROR ENCOUNTERED (MLISP2 lusers: don't
you love this message?). Suggestions welcome.
Semantics routines
This section gives a brief overview on how to write the semantic routines
for your new parser. The following is a listing of files SEM and PROCES
for the sample grammar given in the preceding section. Notice the needed
internal and external variables needed for communication with the rest of
the system. SEM routines are invoked as reductions are done. PROCES is
only invoked after the whole input string has been recognized by the
parser, and as such could be used for generating code or loading other
passes for your translator, or could conceivably not be needed at all, for
instance in an interpreter. The PROCES given below is just the skeleton of
what you could write, since it is just a dummy program for the assignment
grammar given before. Comments to the global data structures and
interfaces are given after the listings.
PROCES.SAI: The postprocessor skeleton
entry;
begin "PROCESS"
define cl = "('15&'12)",
# = "COMMENT";
external integer array
polstack[1:100]; # The parse stack.;
external integer
polpointer; # The parse stack pointer;
external procedure printpolstack; # prints the parse stack;
internal procedure process; # called from the generated parser;
begin
printpolstack; # In this case just prints the parse stack;
polpointer←0; # and resets the stack pointer;
end;
end "PROCESS";
SEM : The semantic routines
Again the following is a sample skeleton for the semantic routines for
the assignment grammar given before. Notice that no attempt is being made
here of interpreting or translating the expressions.
entry;
begin "SEMANTICS"
define
cl = "('15&'12)",
# = "COMMENT";
external integer procedure symtabentry(string str);
external integer index,intval;
external real realval; # scanned real is stored here!;
external string strval; # scanned string store here by scanner.;
external string array name[0:310]; # symbol table;
internal record_pointer (any_class) array polstack [0:100];
internal integer polpointer;
# The following are the parse stack records. Tag = 1 => Id
# Tag = 2 => Integer, Tag = 3 => Real, Tag = 4 => String
# Realentry is for type 3. Intentry for the others;
record_class intentry (integer tag, val);
record_class realentry (integer tag; real val);
record_class strentry (integer tag; string val);
# Initialize the polish stack.;
internal simple procedure initsem;
begin
polpointer ← 0;
end;
# This procedure prints the current state of the parsestack;
internal procedure printpolstack;
begin "PRINTPOLSTACK"
integer jj;
outstr(cl&"Parse Stack:"&cl);
for jj←polpointer step -1 until 1 do
begin
print(jj,": ");
case (intentry:tag[polstack[jj]]-1) of
begin
print(name[intentry:val[polstack[jj]]],cl);
print(intentry:val[polstack[jj]],cl);
print(realentry:val[polstack[jj]],cl);
print("""",strentry:val[polstack[jj]],"""",cl)
end;
end;
end "PRINTPOLSTACK";
internal procedure semantics(integer rule);
begin "SEM"
integer idindex; "here is what is pushed when an id or op is going to be dumped"
procedure pushpol(integer tag1(1));
begin "PUSHPOL"
record_pointer (any_class) p;
if tag1 = 3 then "push a real"
begin
p ← new_record(realentry);
realentry:tag[p] ← tag1;
realentry:val[p] ← realval;
end
else
if tag1 = 4 then "push a string"
begin
p ← new_record(strentry);
strentry:tag[p] ← tag1;
strentry:val[p] ← strval;
end
else begin
p ← new_record(intentry);
intentry:tag[p] ← tag1;
case (tag1-1) of
begin
intentry:val[p] ← idindex;
intentry:val[p] ← intval
end;
end;
polstack[polpointer ← polpointer+1] ← p
end "PUSHPOL";
# The number you give as a prefix of a production is really an index into
this case statement. Here we are only pusshing objects onto the stack;
case rule of
begin
begin "+"
idindex ← symtabentry("+");
pushpol(1);
end "+";
begin "-"
idindex ← symtabentry("-");
pushpol(1);
end "-";
begin "↑"
idindex ← symtabentry("↑");
pushpol(1);
end "↑";
begin "*"
idindex ← symtabentry("*");
pushpol(1);
end "*";
begin "/"
idindex ← symtabentry("/");
pushpol(1);
end "/";
begin "id"
idindex ← index;
pushpol(1);
end "id";
begin ":="
idindex ← symtabentry(":=");
pushpol(1);
end ":=";
begin "REAL"
pushpol(3)
end "REAL";
begin "NUM"
pushpol(2)
end "NUM";
begin "STRING"
pushpol(4);
end "STRING"
end;
return;
end "SEM";
end "SEMANTICS";
The default scanner
The following is the default scanner for generated parsers.
entry;
begin "ACCEPT"
define
crlf = "('15&'12)",
# = "comment",
unknown = "999",
symtabsize = "311",
symtabsizem1 = "(symtabsize-1)",
hash(x) = "(((cvasc(0&x)) mod symtabsizem1)+1)";
# Notice that polstack has the associated poltag stack in which
# stack entries are tagged;
external record_pointer (any_class) array polstack[1:100];
external integer polpointer,parse,parseflag;
# Notice that token and itval are the interface with the slr1 parser.;
external integer count, token;
external simple procedure push(integer astate);
internal integer inchan, brchar, flag, index;
# our string space;
# The following three vars hold scanned values;
internal real realval;
internal integer intval;
internal string strval;
internal string bstr, inputstr;
internal real array value[0:symtabsizem1];
internal integer array type[0:symtabsizem1];
internal string array name[0:symtabsizem1];
real int;
integer jj, indx;
string t;
forward internal integer procedure symtabentry(string x);
# Some useful procedures;
simple procedure mylop(reference string s);
# Eliminate the first char of s;
begin
integer i;
i ← lop(s);
end;
simple procedure deblank(reference string s;integer i);
# Eliminate all i's from the begining of s;
begin "DEBLANK"
while s = i do mylop(s);
end "DEBLANK";
internal integer procedure symtabentry(string x);
begin "SYMTABENTRY"
integer i,j,k;
k ← 0;
j ← (i ← hash(x));
while ¬(length(name[i])=0 ∨ equ(name[i],x)) do
begin
if equ(name[i]," ") ∧ k=0 then k ← i;
i ← (i+j) mod symtabsize;
end;
if (i=0) ∧ (k=0) then outstr("Symtab Full"&crlf);
if length(name[i])=0 then
begin
if k then i ← k;
name[i] ← x;
type[i] ← unknown;
value[i] ← '400000000001;
end;
return(i);
end "SYMTABENTRY";
internal procedure initsymtab(string str);
begin "INITSYMTAB"
string field1, field2;
integer inch, flag, flag1, symcount, brchar;
procedure error;
begin
outstr("Can't happen -- Strange symbol table!"&crlf);
call(0,"Exit");
end;
setbreak(3,"$",'11&'40,"isk");
setbreak(4,"0123456789"&'11&'40,'11&'40,"xsk");
setbreak(5,'11,null,"insk");
symcount ← 0;
do begin
field1 ← scan(str,3,brchar);
if ¬(brchar = "$") then error;
field2 ← scan(str,4,brchar);
if ¬(brchar = '15) then error;
index ← symtabentry(field1);
type[index] ← cvd(field2);
name[index] ← field1;
symcount ← symcount + 1;
scan(str,5,brchar);
end
until str = null;
end "INITSYMTAB";
procedure scanstring;
begin "scanstring"
string s1;
integer b;
# skip first char which must be ';
mylop(bstr);
s1 ← null;
while true do
begin
s1 ← s1&scan(bstr,7,b);
if (b='15) ∨ (b=null) then
" Notice horrendous fixup for reading from TTY instead of DSK:
This is needed since activation char might have thrown in line editor
and line will not end in CRLF!!"
begin
s1 ← s1&crlf;
# get a new line;
inputstr ← bstr ← input(parse,6);
if parseflag then
begin
outstr("Unexpected EOF. String delimiter missing."&crlf);
done;
end;
end
else if bstr = "'" ∧ b = "'" then
begin s1 ← s1&"'"; mylop(bstr) end
else done;
end;
token ← 4;
strval ← s1;
end "scanstring";
procedure scanquotes;
begin "scanquotes"
integer b;
# skip first char which must be ";
# the last char must also be ";
mylop(bstr);
while true do
begin
scan(bstr,9,b);
if (b='15) ∨ (b=null) then "notice extremely unclean code for TTY hack"
begin
# get a new line;
inputstr ← bstr ← input(parse,6);
if parseflag then
begin
outstr("Unexpected EOF. Comment delimiter missing."&crlf);
done;
end;
end
else done;
end;
end "scanquotes";
procedure scancomment;
begin "scancomment"
# skip first char which must be ".....comment.....";
# last char must be sc;
bstr ← bstr[8 to ∞];
while true do
begin
while (bstr≠null) ∧ (bstr≠'15) ∧ (bstr≠";") do mylop(bstr);
if (bstr='15) ∨ (bstr=null) then "notice extremely unclean code for TTY hack"
begin
# get a new line;
inputstr ← bstr ← input(parse,6);
if parseflag then
begin
outstr("Unexpected EOF. Comment delimiter missing."&crlf);
done;
end;
end
else begin mylop(bstr); done; end;
end;
end "scancomment";
procedure scannum;
begin "scannum"
string s1;
real r;
integer b,f,l;
simple procedure error (string str);
# Send an error msg;
begin
outstr(str&crlf&" input string : "&s1)
end;
s1 ← scan(bstr,8,b);
if length(s1) > 11 then begin
error("Number too long. Truncated to 11 digits.");
s1 ← s1[1 to 11];
end;
l ← cvd(s1);
if bstr = "." then "Could be a real!"
begin
mylop(bstr);
if bstr = "." then "...sorry! it was an integer."
begin
bstr ← "."&bstr; "Yea kid. Put it back."
token ← 2;
intval ← l;
return;
end;
"OK. Now we really know its a real"
r ← l;
if ("0" ≤ bstr ≤ "9") then
"here comes the fractional part.."
begin
s1 ← scan(bstr,8,b);
if length(s1) > 11 then
begin
error("Fractional part too long. Truncated to 11 digits.");
s1 ← s1[1 to 11];
end;
r ← r + cvd(s1)/(10↑length(s1));
end;
"Look for an exponent in E notation"
if bstr = "E" then
begin
mylop(bstr);
if (bstr = "+") ∨ (bstr = "-") ∨ ("0" ≤ bstr ≤ "9") then
"sure, there is an exponent!"
begin
f ← (bstr = "-");
if f ∨ (bstr="+") then mylop(bstr);
s1 ← scan(bstr,8,b);
if abs(log(r) + cvd(s1) * (if f then -1 else 1)) > 35 then
begin
error("Exponent too long. Eliminated..");
s1 ← null;
end
else
r ← r*10↑(cvd(s1)*(if f then -1 else 1));
end
else bstr ← "E" & bstr;
end;
token ← 3; realval ← r;
end
else begin
token ← 2; intval ← l;
end;
end "scannum";
procedure scanidop (string t);
# The identifier and operator scanning routine;
# First param is id or op string;
begin "scanidop"
indx ← symtabentry(t);
if type[indx]=unknown then
begin
token ← type[indx] ← 1;
index ← indx;
end
else begin
if type[indx]=1 then index ← indx;
token ← type[indx];
end;
end "scanidop";
internal procedure shift(integer i);
# The shift procedure. Advances token stream by one token. Leaves in
# token the type (0=EOF,1=ID,2=INT,3=REAL,4=STRING,>4= reserved word)
# and in INTVAL, REALVAL, STRINGVAL the corresponding scanned value;
begin "shift"
push(i);
token ← 0;
while ¬ parseflag do
begin
while ¬((bstr = '15) ∨ (bstr = null)) do
begin
while (bstr='40) ∨ (bstr='11) do mylop(bstr);
" Now try to see if a number follows "
if ("0" ≤ bstr ≤ "9") then
begin scannum; return; end
else if bstr = "'" then begin scanstring; return end
else if bstr = """" then begin scanquotes; continue end
else if equ(bstr[1 to 7],"COMMENT") then
begin scancomment; continue; end
else begin
if ("A" ≤ bstr ≤ "Z") then
scanidop(scan(bstr,2,brchar))
else begin
t ← lop(bstr);
# bletch!;
if ((t=":" ∨ t="<" ∨ t=">") ∧ bstr="=") ∨
(t = ":" ∧ bstr="=") ∨
(t = "(" ∧ (bstr = "." ∨ bstr = ":"))∨
(t = ")" ∧ (bstr = "." ∨ bstr = ":"))∨
(t = "." ∧ bstr=".")
then t ← t&lop(bstr);
scanidop(t);
end;
return;
end;
end;
inputstr ← bstr ← input(parse,6);
if parseflag then begin token ← 0; return end;
end;
end "shift";
internal procedure breakinit;
begin
setbreak(2,"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",null,"xrk");
setbreak(6,'12,'13&'14,"inak");
setbreak(7,"'"&'15,null,"insk");
setbreak(8,"0123456789",null,"xrk");
setbreak(9,""""&'15,null,"insk");
setbreak(10,"=.",null,"xrk");
end;
end "ACCEPT";